Quantitative methods for identifying systematic polysemy classes
Authors
Abstract
In this paper we report the results of four experiments conducted to extract lists of nouns that exhibit inherent polysemy from corpus data, following semiautomatic and automatic procedures. We compare the methods used and the results obtained. We argue that quantitative methods can be used to distinguish different classes of polysemous nouns in the language on the basis of the variability of copredication contexts.
I. MOTIVATION AND GOALS
In this paper we examine nouns which exhibit systematic (or regular) polysemy, i.e. general sense alternations of the kind animal-food, where the same relation holds between the meanings for a series of lexical items in a language (such as chicken, rabbit, codfish, lamb, etc.) and is not particular to a single one (cf. [1] and subsequent work). The goal is twofold: to acquire nouns belonging to different polysemy alternations from corpora, and to group them into different classes based on the underlying nature of their systematic polysemy, which we assume not to be univocal.
Specifically, we are interested in telling apart two kinds of nouns. The first kind exhibits systematic polysemy because of a regular ability to convey multiple aspects in a single occurrence, and thus denotes entities having a complex type (for instance, physicalobject-informationobject, e.g., ‘book’; event-food, e.g., ‘lunch’). The second kind usually presents a single aspect in an occurrence, and its systematic polysemy is more likely due to (more or less lexicalized) coercion effects triggered by the linguistic and/or the pragmatic context (animal-food, e.g., ‘chicken’; container-containee, e.g., ‘bottle’).
Previous work has shown that copredication,¹ the usual test employed in the literature to distinguish the first kind of nouns (variously called “complex or dotted type nouns” [3], “nouns with facets” [4], “dual aspect nouns” [2], “inherently polysemous nouns” [5]) from other kinds of systematically polysemous nouns, including “selectional polysemy” [5] and “pseudo-dots” [3] such as animal-food or container-containee, is not sufficient, because copredication is also possible, albeit less frequent, with expressions which exhibit polysemy due to coercion effects. This is the case of the noun ‘sandwich’ in such contexts as ‘Sam grabbed and finished the sandwich in one minute’, in which ‘sandwich’ is predicated both as a physical object and as the event of eating it. In an earlier work [6], we proposed that the variability of pairs of predicates in copredication contexts is the key to distinguishing inherently polysemous nouns from nouns subject to coercion. According to this hypothesis, high variability of pairs of predicates in copredication contexts is evidence of inherently polysemous nouns, while low variability points to nouns subject to coercion effects. Our work has also shown that the bottleneck of a quantitative methodology meant to distinguish different classes of polysemous nouns is the high-precision identification of predicates selecting for the different aspects of the nouns. In particular, manual selection has proved to be very time consuming.
¹ Copredication can be formally defined as a “grammatical construction in which two predicates jointly apply to the same argument” [2]. We focus here on copredication contexts in which the two predicates select for disjoint types. An example is ‘They burned the controversial books’, where the predicate ‘burned’ selects for the physicalobject aspect (or sense) of the argument ‘books’ while ‘controversial’ selects for the informationobject aspect.
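The variability hypothesis can be illustrated with a minimal Python sketch. It is not the measure used in [6], which this section does not specify. It assumes that copredication contexts have already been extracted as (noun, predicate, predicate) triples, and it uses two stand-in scores: the number of distinct predicate pairs and the Shannon entropy of the pair distribution. The function name `pair_variability` and the toy observations are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def pair_variability(observations):
    """Map each noun to (distinct pairs, total contexts, entropy in bits).

    `observations` is an iterable of (noun, predicate_a, predicate_b)
    triples, one per copredication context. Both scores are illustrative
    assumptions, not the measure defined in the paper.
    """
    by_noun = defaultdict(Counter)
    for noun, pred_a, pred_b in observations:
        # Treat the pair as unordered: (grab, finish) == (finish, grab).
        by_noun[noun][frozenset((pred_a, pred_b))] += 1

    scores = {}
    for noun, counts in by_noun.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        scores[noun] = (len(counts), total, entropy)
    return scores


# Invented toy data, for illustration only.
toy = [
    ("book", "burn", "controversial"),
    ("book", "heavy", "interesting"),
    ("book", "shelve", "read"),
    ("sandwich", "grab", "finish"),
    ("sandwich", "finish", "grab"),
    ("sandwich", "grab", "finish"),
]
scores = pair_variability(toy)
```

On this toy input, ‘book’ occurs with three different predicate pairs while ‘sandwich’ always occurs with the same one, so under the hypothesis ‘book’ would pattern with inherently polysemous nouns and ‘sandwich’ with nouns subject to coercion. Real data would need a frequency threshold, since nouns with very few copredication contexts cannot show high variability.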
In this paper, we report the results of experiments we ran to evaluate the whole methodology developed in [6], and to test the possibility of extending it so as to automate the selection of predicates by exploiting distributional methods. Specifically, Section II-A outlines the methodology we adopted in [6]. Section II-B introduces the distributional method for selecting predicates that we used for two experiments. Related work is discussed in Section III. Section IV presents the two main experiments and an evaluation procedure used to compare them with two baselines: the manual experiment of [6] and another one based on Lexit, a lexical resource for Italian. Section V discusses the results, and Section VI draws conclusions and offers hints for further work.
Similar resources
Semi-productive Polysemy and Sense Extension
In this paper we discuss various aspects of systematic or conventional polysemy and their formal treatment within an implemented constraint based approach to linguistic representation. We distinguish between two classes of systematic polysemy: constructional polysemy, where a single sense assigned to a lexical entry is contextually specialised, and sense extension, which predictably relates two...
A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes
We present an unsupervised method for inducing verb classes from verb uses in gigaword corpora. Our method consists of two clustering steps: verb-specific semantic frames are first induced by clustering verb uses in a corpus and then verb classes are induced by clustering these frames. By taking this step-wise approach, we can not only generate verb classes based on a massive amount of verb use...
Semi-automatic Induction of Systematic Polysemy from WordNet
This paper describes a semi-automatic method of inducing underspecified semantic classes from WordNet verbs and nouns. An underspecified semantic class is an abstract semantic class which encodes systematic polysemy, a set of word senses that are related in systematic and predictable ways. We show the usefulness of the induced classes in the semantic interpretations and contextual inferences o...
Modeling Regular Polysemy: A Study on the Semantic Classification of Catalan Adjectives
We present a study on the automatic acquisition of semantic classes for Catalan adjectives from distributional and morphological information, with particular emphasis on polysemous adjectives. The aim is to distinguish and characterize broad classes, such as qualitative (gran ‘big’) and relational (pulmonar ‘pulmonary’) adjectives, as well as to identify polysemous adjectives such as econòmic (...
Lexicalised Systematic Polysemy in WordNet
This paper describes an attempt to gain more insight into the mechanisms that underlie lexicalised systematic polysemy. This phenomenon is interpreted as systematic sense combinations that are valid for more than one word. The hierarchical structure of WordNet is exploited to create a working definition of systematic polysemy and extract polysemic patterns at a level of generalisation that allo...